DOCUMENT RESUME



2D 15a C22 ^: CO7 C35 

AUTHOR          Webster, Raymond E.; Bates, Herman M., III

TITLE           Federal Legislation Defining Learning Disabilities
                and Biased IQ Scores.

SPONS AGENCY    Bureau of Education for the Handicapped (DHEW/OE)

PUB DATE        Aug 77

Co Mr Ac . tC 1c t 

GRANT           G007605223

NOTE            Paper presented at the Annual Meeting of the
                American Psychological Association (85th, San
                Francisco, California, August 26-30, 1977)

DESCRIPTORS     Age Differences; Educational Legislation; Elementary
                Secondary Education; Federal Legislation;
                *Intelligence Quotient; *Intelligence Tests;
                *Learning Disabilities; Predictive Validity; Student
                Placement; Test Bias; *Test Reliability; *True
                Scores

IDENTIFIERS     *Test Retest Reliability

ABSTRACT

Public Law 94-142 mandates the identification and
placement of learning disabled children based primarily on the
measurement of intelligence. It is, therefore, the responsibility of
educational psychologists to use standardized intelligence tests
appropriately to accurately and objectively assess a child's
intellectual potential and ability. Three general assumptions
underlying the measurement of intelligence are discussed: (1) that
intelligence is both measurable and quantifiable; (2) that it is
distributed according to a normal curve; and (3) that intelligence
remains constant over time. The constancy of intelligence level and
the stability of its measurement (test reliability) are important
issues. Test reliability may be established according to internal
consistency; correlation with equivalent or parallel tests; or, most
importantly, consistency of measurement over time. Data on the
stability of certain intelligence tests over time are presented.
These limited data suggest that the scores of young children are less
stable than those of older children and adults; that stability decreases as
the length of the test-retest interval increases; and that children
with various disabilities exhibit more test-retest variation. A
formula is presented for estimating an unbiased true score. More
research is needed investigating fair methods of placing learning
disabled children.






FEDERAL LEGISLATION DEFINING
LEARNING DISABILITIES AND 
BIASED IQ SCORES* 






A paper selected for presentation 
at the American Psychological 
Association Annual Convention, 
Division #16 "Poster Session", 
San Francisco, August, 1977.



Raymond E. Webster 
Herman M. Bates, III 

Department of Educational Psychology 

University of Connecticut 

Box 

Storrs, Connecticut 06268 




*Preparation of this paper was aided in part by a grant from the Bureau of
Education for the Handicapped, USOE, "Learning Disabilities in Mathematics:
A Curriculum Design for Upper Grades", G007605223, #443CH60166, under the
direction of J. F. Cawley, Project Director, University of Connecticut,
Storrs, Connecticut 06268.




FEDERAL LEGISLATION DEFINING LEARNING 
DISABILITIES AND BIASED IQ SCORES 

In recent years the placement of children solely on the basis of an IQ
score, or a battery of tests purportedly measuring IQ, has been seriously
questioned as a measurement procedure. Placements based upon this
prediction model are generally administratively beneficial. However, incorrect
decisions can often be damaging to the individual child involved.

Section S of the Proposed Rules on Specific Learning Disabilities issued 
by the Office of Education of the Department of Health, Education and Welfare on 
November 29, 1976 (referred to as Public Law 94-142) coerces the prediction and 
placement of children primarily on the basis of measurement of intelligence.

As such, it is the responsibility of school and educational psychologists
to use standardized intelligence tests in the most appropriate manner to 
accurately and objectively assess a child's intellectual potential and ability. 

There are a number of theoretical and general assumptions underlying the 
measurement of intelligence. Perhaps the most obvious is that the global entity
intelligence is both measurable and quantifiable. Aside from the problems
caused by the lack of a common operational definition for intelligence, this
assumption tends to ignore individual fluctuations in capacity to deal with day
to day situations, individual mood swings, transient physical disabilities, and
most importantly, socio-economic and cultural background differences. 

A second assumption, and one that can neither be proven nor disproven, is 
that intelligence within the general population is normally distributed according 
to the Gaussian Curve. Dingman and Tarjan (1960) have collected data that
seriously question this assumption. They calculated the expected number of
people at various low IQ levels according to the normal curve. They then compared
these estimates to the population of the United States, 210 million. Large
discrepancies were found between predicted estimates from the Gaussian Curve
and actual numbers. For example, in the 0 to 20 IQ range the predicted number 
of cases is 57. Dingman and Tarjan found the actual prevalence of cases in this 
IQ range to be 104,935. 
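
The predicted counts in a comparison of this kind come from multiplying a tail area of the normal curve by the population size. The following Python sketch illustrates that calculation. The mean of 100 and standard deviation of 16 are assumptions made for illustration (the text does not state the parameters used), so the result should only approximate the predicted figure quoted above.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_cases(low, high, population, mean=100.0, sd=16.0):
    """Expected number of people with IQ between low and high.

    mean and sd are illustrative assumptions, not values from the paper.
    """
    proportion = normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)
    return proportion * population


if __name__ == "__main__":
    predicted = expected_cases(0, 20, 210_000_000)
    observed = 104_935  # prevalence reported in the text for IQ 0 to 20
    print(f"predicted: {predicted:.0f}, observed: {observed}")
```

Under these assumed parameters the predicted count is in the tens, against an observed prevalence above one hundred thousand, which is the scale of discrepancy the authors describe.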

Another extremely important assumption, and one that underlies both the 
measurement of intelligence and the prediction-placement model, is that intelli- 
gence remains constant over time. Certainly, this topic has been a recurring 
issue in recent years. Schaie (1974) has argued that the results from research

"combining the rigor of the scientific method with an address
to problems that may indeed be of social consequence" now indicate
that a presumed decline in adult intelligence is at best a
methodological artifact and at worst a popular misunderstanding
of the relation between individual development and sociocultural
change." (p. 802)

Baltes and Schaie (1974) and Schaie, Labouvie-Vief and Barrett (1973) suggest 
that the decrement hypothesis is based on poor psychometric or biomedical models.
This viewpoint has been hotly contested by Horn and Donaldson (1976, 1977). Although
the crux of this argument has focused on the decline of intelligence related to old age,
the constancy position of intelligence may be criticized because it ignores individual
variability related to physical and psychological development, especially that of the
late-developing child. Binet himself (quoted by Skeels and Dye, 1939) commented 
that: 

Some recent philosophers appear to have given their moral support 
to the deplorable verdict that the intelligence of an individual 
is a fixed quantity, a quantity which cannot be augmented. We 
must protest and act against this brutal pessimism. We shall 

endeavour to show that it has no foundation whatsoever. A child's
mind is like a field for which an expert farmer has advised a
change in the method of cultivating, with the result that in
place of desert land we now have a harvest. It is in this particular
sense, the only one that is significant, that we say that the
intelligence of children may be increased. One increases that
which constitutes the intelligence of a school child; namely the 
capacity to learn, to improve with instruction. 

Despite these arguments, child development textbooks continue to implicitly 
promulgate the idea that intelligence is a temporally stabilized phenomenon. 
Clarke and Clarke (1953) examined a paper by Nemzek (1933) and another by Thorndike 
(1940). Together the two papers reviewed a total of 359 studies of intelligence
measurement carried out before 1940. Clarke and Clarke (1953) concluded that 
(1) the predictive value of the IQ as measured by test-retest correlations 
decreases as the interval between testings increases; (2) although the average 
IQ of the population may not significantly change, some individuals exhibit signif- 
icant variability in IQ measurement; (3) intelligence tests given to children before 
entering school have little value in predicting later achievement; and (4) mental
assessments for infants have no predictive relevance in later years. Again, the
constancy of IQ measurement is questioned.

A complementary issue to the question of IQ constancy is the question of
measurement constancy. This phenomenon is referred to as test-retest reliability,
or stability of measurement over time.

The basic theoretical issue in the stability of intelligence assessment is
the notion of the existence and measurability of a "true" score. This "true" score
is associated with an individual's obtained score on an IQ test. The classical
theory of reliability offers three alternative ways of defining a "true" score
(Lord & Novick, 1968, pp. 28-29). The first notion, referred to by Sutcliffe
(1965) as the Platonic "true" score, suggests that a "true" score exists for each 
observation. This score is not observable because it is obscured by measurement 
errors. A major objection to this position is that the "true" score can never be 



measured or quantified (Thorndike, 1964). A more widely accepted position is
that the "true" score is a probabilistic entity. In principle, if one were to make
an infinite number of observations of some attribute, the mean of these
observations would converge toward a constant or "true" score. The final definition,
and perhaps the most mathematically rigorous, suggests that "corresponding true
and error scores are uncorrelated and that error scores on different measures are also
uncorrelated" (Lord and Novick, 1968, p. 29). The binding element of the three

definitions is the existence of an exact, single "true" score. Accurate measurement 
of this score represents perfect reliability. As the discrepancy between the 
"true" score and an obtained score increases the instrument's reliability 
decreases. 
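
The probabilistic definition can be illustrated with a small simulation. The Python sketch below assumes a hypothetical true score of 100 and normally distributed measurement error with a standard deviation of 5; neither value comes from the paper. It shows the mean of repeated observations settling near the true score as the number of observations grows.

```python
import random
import statistics


def simulate_observations(true_score, error_sd, n, seed=0):
    """Return n observed scores, each the true score plus random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    scores = simulate_observations(true_score=100.0, error_sd=5.0, n=10_000)
    for n in (10, 100, 1_000, 10_000):
        print(n, round(statistics.fmean(scores[:n]), 2))
```

In practice a child is tested once or twice, not thousands of times, which is why the obtained score can sit well away from the value this mean converges to.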

There are three ways in which to measure or establish reliability. The 
most common method refers to the internal consistency of an instrument. This 
may be computed by using the procedures outlined by Kuder and Richardson (1937).
Reliability can also be established through the process of equivalence, where
the parallel form of a test is administered at the same time or a short time
after, and then correlated with the instrument in question. The final form of
reliability, and perhaps the most important when dealing with intelligence
measurement, refers to the consistency or stability of measurement over time. 
In this procedure the same form of an instrument is administered two (or more) 
times to the same sample population separated by varying intervals of time. In 
the case of intelligence tests, this period of time should extend over a minimum 
of months, and ideally, years. This is especially important if one subscribes to 
the IQ constancy paradigm, a position implicitly accepted in most public schools,
mental health clinics and hospitals. 
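
Stability over time is conventionally summarized by correlating the scores from the first and second administrations. A minimal Python sketch of such a test-retest coefficient follows; the six pairs of scores are invented for illustration and do not come from any study cited here.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with 2+ scores")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / math.sqrt(ss_1 * ss_2)


if __name__ == "__main__":
    # Hypothetical IQ scores for the same six children, tested twice.
    first_testing = [85, 92, 100, 104, 110, 118]
    second_testing = [88, 90, 97, 108, 107, 121]
    print(f"stability r = {pearson_r(first_testing, second_testing):.2f}")
```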



The initial reports of reliability for the WISC (Wechsler, 1949) were
established through a split-half reliability procedure, a measure of internal 
consistency and not of stability of measurement over time. More recently for 
the WISC-R (Wechsler, 1974) stability coefficients for periods of three to five 
weeks are reported. However, the instrument's stability over longer periods of 
time has not been systematically investigated. The 1960 revision of the
Stanford-Binet reports no test-retest reliability coefficients. There remains a serious lack
of research examining the stability of all general intelligence measures over extended
intervals of time (three months and longer), with various diagnostic groups of
children. Table 1 contains the results of 12 studies examining the stability of
the WPPSI, WISC, WAIS, and the Stanford-Binet Forms L and M found through a review
of the research literature. No studies examining the stability of WISC-R scores
were found.



TABLE 1 ABOUT HERE 



Inspection of the table shows that there has been a paucity of research
investigating the stability of intelligence tests over time. Thus, it is difficult
to determine general trends and patterns throughout the research. However, it may
be concluded that (1) the scores of young children are less stable than those of
older children and adults; (2) as the length of the test-retest interval increases,
the stability coefficient decreases; and (3) children who exhibit different types
of disabilities show more test-retest variation. Eysenck (1953) has shown that
the test-retest correlation coefficient for large groups decreases steadily at
the rate of about 0.04 a year. If one assumes that the average test-retest
reliability coefficient after immediate retesting is around .90 (Thorndike, 1933, 1940)
then after seven years a test-retest coefficient of around .62 is expected.
Consequently, only 38.44% of the total variation (the square of .62) is attributable
specifically to the intelligence measure. This suggests that if intelligence scores are to
be used for prediction and placement then children with specific disabilities
should be frequently re-evaluated. Frequent re-assessment is even more important
for the younger disabled child because IQ estimates may be confounded by both
developmental factors and the nature and extent of the disability. There is a
definite need for more systematic research on the stability of IQ scores as a
function of chronological age and of emotional and learning disabilities.
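
The arithmetic behind the seven-year projection can be sketched in Python as follows. The function names are hypothetical, and the strictly linear decline is a simplification of the "about 0.04 a year" rate quoted above.

```python
def expected_retest_r(initial_r, years, decline_per_year=0.04):
    """Test-retest coefficient after a linear yearly decline, floored at 0."""
    return max(0.0, initial_r - decline_per_year * years)


def shared_variance(r):
    """Proportion of variation shared between the two testings (r squared)."""
    return r * r


if __name__ == "__main__":
    for years in (0, 1, 3, 5, 7):
        r = expected_retest_r(0.90, years)
        print(f"{years} yr: r = {r:.2f}, r squared = {shared_variance(r):.4f}")
```

Starting from .90, the projected coefficient after seven years is .62, and its square gives the 38.44% figure.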

As the test-retest reliability of an instrument decreases, the estimated 
true score shows a greater tendency to regress toward the mean. The most serious 
implications of this regression are for extreme scores or those that lie farthest
from the mean. 

The obtained score on an intelligence test with less than perfect reliability 
represents an outwardly biased estimate of the child's "true" score and ability. 
Thus, for low IQ scores, the estimated "true" scores tend to be higher, and for 
high IQ scores they tend to be lower. 

An unbiased estimate or "true" score for a child can be computed by using 
the following formula suggested by Nunnally (1967, pp. 220-221):

    $$X' = r_{x_1 x_2} X$$

where

    $X'$ = the estimated unbiased deviation score;
    $r_{x_1 x_2}$ = reliability of the test used; and
    $X$ = the deviation score (the obtained score minus the mean).

The "true" score or best unbiased estimate of IQ =
mean of the test instrument (usually 100) + $X'$.



This formula yields the best unbiased estimate about which confidence intervals
should be established. This unbiased estimate is also the fairest number (to the
child) to be used in the optional "severe discrepancy" formula or as the basis in
determining if a child meets the 50% discrepancy between ability and achievement.

The following example shows how the above formula might be used. Coleman 
(1963) found that the test-retest reliability coefficient for the Full Scale IQ
score of the WISC for learning disabled children of age 7.5 years is .77, with a 
standard error of measurement of 8.61. A child with an obtained IQ score of 80 
would have a deviation score (X) of -20, and an estimated unbiased "true" IQ score 
of 84.6, or rounded to 85. The confidence interval for the 95% level of confidence 
is 85 ± 16.88. The asymmetrical nature of the confidence interval around the
observed IQ score of 80 serves as a reminder of the biased nature of the obtained 
score as a function of the error intrinsic to the instrument. 
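
The worked example can be reproduced with a short Python sketch. The function names are hypothetical. The interval follows the paper's own arithmetic, taking 1.96 standard errors of measurement on either side of the rounded estimate.

```python
def estimated_true_iq(obtained, reliability, test_mean=100.0):
    """Shrink the deviation score by the reliability, then add the mean."""
    deviation = obtained - test_mean          # X in the text
    return test_mean + reliability * deviation


def confidence_interval(center, sem, z=1.96):
    """Interval of z standard errors of measurement around center."""
    half_width = z * sem
    return center - half_width, center + half_width


if __name__ == "__main__":
    estimate = estimated_true_iq(80, 0.77)    # Coleman (1963) reliability
    low, high = confidence_interval(round(estimate), 8.61)
    print(f"estimate = {estimate:.1f}, 95% interval = {low:.2f} to {high:.2f}")
```

With an obtained score of 80 the interval runs from about 68 to about 102, so it extends farther above the obtained score than below it.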

Concluding Remarks 

There is a great need for continued systematic research investigating the
stability of intelligence test measurements for various diagnostic groups. The 
procedures used to place children into special classes, programs or institutional 
settings should be modified until empirical research confirms and/or defines the 
parameters of predictive efficacy of these tests. An unbiased estimate of an IQ 
score should probably be computed and used in these modified placement procedures 
in an effort to be fair to each child. 

To clarify the definition of learning disability, the federal government 
has perpetuated a measurement dilemma. The optional formula to predict whether 
a child is severely discrepant between ability and achievement demands the 
determination of the child's "true" IQ. This situation becomes even more complicated
when one considers that at least two measures of intelligence must be used to
determine the IQ score. 

It is proposed that the best unbiased estimate be computed and used as a
fairer estimate of a child's true intellectual ability and potential. 

TABLE 1

Stability Coefficients for Widely Used Intelligence Measures

Instrument      Sample Characteristics      Test-Retest      Stability r       Authors
                                            Interval

WISC            n = 39, IQ range 40-79;     3-4 mos.         Verbal  = .92     Throne, Schulman &
                institutionalized MR;                        Perfor. = .89     Kasper (1962)
                age range 11-0 to 14-11                      FSIQ    = .95

WISC            n = 26; anti-social and     6 mos.           Verbal  = .81     Turner, Mathews &
                habit disordered;                            Perfor. = .73     Rachman (1967)
                age [illegible]                              FSIQ    = .80

WISC            mentally retarded                            Verbal  = .48     Friedman (1970)
                                                             Perfor. = .78
                                                             FSIQ    = .88

WISC            n = 24 males; learning                       Verbal  = .62     Coleman (1963)
                disabled; mean IQ = 102                      Perfor. = .81
                                                             FSIQ    = .77

WISC            n = 21 (11 males and        2 to 15 mos.;    Verbal  = .819    Tigay & Kempler (1971)
                10 females); emotionally    mean = 7.8 mos.  Perfor. = .508
                disturbed children                           FSIQ    = .834

WISC,           IQ = 55 to 75; children     3 yrs.           Verbal  = .70     Walker & Gross (1970)
Stanford-Binet  had no sensory or                            Perfor. = .73
                behavioral problems                          FSIQ    = .76
                                                             SBIQ    = .79

WISC,           n = 60; "normal" children,  4 yrs.           Verbal  = .77     Gehman & Matyas (1956)
Stanford-Binet  tested in 5th and 9th                        Perfor. = .74
                grades                                       FSIQ    = .77
                                                             SBIQ    = .78

Stanford-Binet  n = 182 mentally retarded                    SBIQ    = .93     Collman & Newlyn (1958)
                students; IQ = 42 to 89;
                age range 6 to 15 yrs.

Stanford-Binet  "normal" subjects aged      3 yrs.           Age range: r      Honzik, Macfarlane &
                2 yrs. to 18 yrs.                            2-5: .32 ± .06    Allen (1948)
                                                             3-6: .57 ± .05
                                                             4-7: .59 ± .04
                                                             5-8: .70
                                                             7-10: .78
                                                             9-12/13: .85
                                                             14/15-18: .79

Stanford-Binet  n = 111 "normals" first                      SBIQ              Bradway & Thompson
                tested as children          10 yrs.          1931 to 41: .65   (1962)
                                            15 yrs.          1941 to 56: .85
                                            25 yrs.          1931 to 56: .59

WISC, WAIS      WISC: n = 46 mentally       WISC mean =      WISC:             Rosen, Stallings, Floor
                retarded children;          33 mos.;         Verbal  = .70     & Nowakiwska (1966)
                WAIS: n = 130 mentally      WAIS mean =      Perfor. = .72
                retarded adults             29.5 mos.        FSIQ    = .81
                                                             WAIS:
                                                             Verbal  = .87
                                                             Perfor. = .92
                                                             FSIQ    = .88

WPPSI           n = 50 "normal"             3 mos.           FSIQ    = .9?     Oldridge & Allison
                5-yr.-olds                                                     (1968)